Skip to content

Use internal Url struct in favor of url::Url to minimize the url dep in payjoin#1377

Merged
spacebear21 merged 3 commits intopayjoin:masterfrom
benalleng:url-rewrite
Apr 21, 2026
Merged

Use internal Url struct in favor of url::Url to minimize the url dep in payjoin#1377
spacebear21 merged 3 commits intopayjoin:masterfrom
benalleng:url-rewrite

Conversation

@benalleng
Copy link
Copy Markdown
Collaborator

@benalleng benalleng commented Mar 2, 2026

Closes #1124

This adds the internal Url Struct to no longer depend on the url crate directly for it.

However we still have reqwest inside payjoin url from payjoin-directory and test-utils which pull in the url dep, but that is only available on the io feature.

I have also added a fuzz target, which already helped me better shape the parse function better.

I have only ran the fuzzer for a limited amount of time however and some more cpu hours might be helpful

Claude really helped me get through the brunt of this.

You can test the lack of an accessible url dep not including the io feature with this command.

cargo tree -p payjoin --no-default-features --features v1,v2,directory,_manual-tls,_test-utils -e no-dev,no-build -i url
Pull Request Checklist

Please confirm the following before requesting review:

@coveralls
Copy link
Copy Markdown
Collaborator

coveralls commented Mar 2, 2026

Coverage Report for CI Build 24696334995

Coverage increased (+0.4%) to 84.904%

Details

  • Coverage increased (+0.4%) from the base build.
  • Patch coverage: 20 uncovered changes across 8 files (469 of 489 lines covered, 95.91%).
  • 2 coverage regressions across 2 files.

Uncovered Changes

File Changed Covered %
payjoin/src/core/url.rs 428 420 98.13%
payjoin-cli/src/app/v1.rs 5 1 20.0%
payjoin-cli/src/app/config.rs 6 4 66.67%
payjoin/src/core/io.rs 3 1 33.33%
payjoin-cli/src/app/v2/ohttp.rs 6 5 83.33%
payjoin/src/core/ohttp.rs 4 3 75.0%
payjoin/src/core/receive/error.rs 1 0 0.0%
payjoin/src/core/receive/optional_parameters.rs 11 10 90.91%

Coverage Regressions

2 previously-covered lines in 2 files lost coverage.

File Lines Losing Coverage Coverage
payjoin-cli/src/app/v1.rs 1 68.46%
payjoin/src/core/into_url.rs 1 96.23%

Coverage Stats

Coverage Status
Relevant Lines: 13361
Covered Lines: 11344
Line Coverage: 84.9%
Coverage Strength: 401.04 hits per line

💛 - Coveralls

@benalleng benalleng changed the title Url rewrite Remove url dep from payjoin crate* Mar 2, 2026
@benalleng benalleng changed the title Remove url dep from payjoin crate* Remove url dep from payjoin crate Mar 2, 2026
@benalleng benalleng changed the title Remove url dep from payjoin crate Use internal Url struct in favor of url::Url to minimize the url dep in payjoin Mar 2, 2026
@benalleng benalleng marked this pull request as draft March 2, 2026 23:31
@benalleng

This comment was marked as resolved.

@benalleng benalleng force-pushed the url-rewrite branch 2 times, most recently from 7c97b6c to 8e51345 Compare March 27, 2026 15:10
@benalleng benalleng marked this pull request as ready for review March 27, 2026 15:42
Copy link
Copy Markdown
Contributor

@DanGould DanGould left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can edit the spec to say "the host component of any URL in a BIP77 message MUST be an ASCII LDH (Letters, Digits, Hyphens) plus dots: [a-zA-Z0-9.-] hostname or an IPv4 address literal" to kill IDNA once and for all. For the full URL, we restrict the character set to RFC 3986 unreserved plus the structural delimiters we actually need (:/, ?, #, @). Reject any byte outside printable ASCII (0x21–0x7E) that isn't percent-encoded.

There's a specific URL fuzzer we might consider here: https://github.com/orangetw/Tiny-URL-Fuzzer https://github.com/orangetw/Tiny-URL-Fuzzer/blob/master/samples.txt could also test against

https://payjo.in\@evil.com/path
https://payjo.in%40evil.com/path
https://payjo.in#\@evil.com
https://payjo.in/../evil.com
https://payjo.in/%2e%2e/evil.com
https://evil.com:443@payjo.in/
https://payjo.in%00.evil.com/path
https://payjo.in:@evil.com/

not that these are reall attacks but just so we knwo we parse the same as url for stuff we could actually encounter. Low prio but I wanted to document.

Comment thread payjoin/src/core/url.rs Outdated
Comment thread payjoin/src/core/url.rs Outdated
}

#[derive(Debug, Clone, PartialEq, Eq)]
#[allow(dead_code)]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note to self that this is removed by the time the PR's last commit comes around.

Copy link
Copy Markdown
Collaborator Author

@benalleng benalleng Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want support for ipv4 or ipv6 hosts in payjoin::Url if not we can just keep Host::Domain.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for testing especially this is still useful. What do you think?

Comment thread payjoin/src/core/receive/v2/mod.rs Outdated
Comment thread payjoin-cli/Cargo.toml Outdated
@DanGould
Copy link
Copy Markdown
Contributor

Tnull says "Def. no huge blocker if there is a path to dropping it!" but this seems pretty darn close so I'd like to close it out to keep the prs flowing

@benalleng benalleng force-pushed the url-rewrite branch 12 times, most recently from 652f4a7 to 4fca182 Compare April 14, 2026 20:15
@benalleng benalleng force-pushed the url-rewrite branch 3 times, most recently from 96f3353 to b3bcf75 Compare April 15, 2026 13:42
Copy link
Copy Markdown
Contributor

@xstoicunicornx xstoicunicornx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have a bit of feedback.

One other thing, seems like we are replicating the url::form_urlencoded::parse method in two different places so maybe its worth creating a native implementation of this in url.rs?

Comment thread payjoin/src/core/url.rs Outdated
Comment thread payjoin/src/core/url.rs Outdated
Comment thread payjoin/src/core/url.rs Outdated
scheme.push(c);
}
':' => break,
_ => return Err(ParseError::InvalidCharacter),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just use ParseError::InvalidScheme instead, as this is more consistent with how parse_port errors are handled

Comment thread payjoin/src/core/url.rs Outdated
Comment on lines +438 to +457
let mut path = String::new();
let mut query: Option<String> = None;
let mut fragment: Option<String> = None;

if let Some(frag_pos) = input.find('#') {
let before_fragment = &input[..frag_pos];
fragment = Some(input[frag_pos + 1..].to_string());

if let Some(q_pos) = before_fragment.find('?') {
path.push_str(&before_fragment[..q_pos]);
query = Some(before_fragment[q_pos + 1..].to_string());
} else {
path.push_str(before_fragment);
}
} else if let Some(q_pos) = input.find('?') {
path.push_str(&input[..q_pos]);
query = Some(input[q_pos + 1..].to_string());
} else {
path.push_str(input);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit confusing to follow why not just:

Suggested change
let mut path = String::new();
let mut query: Option<String> = None;
let mut fragment: Option<String> = None;
if let Some(frag_pos) = input.find('#') {
let before_fragment = &input[..frag_pos];
fragment = Some(input[frag_pos + 1..].to_string());
if let Some(q_pos) = before_fragment.find('?') {
path.push_str(&before_fragment[..q_pos]);
query = Some(before_fragment[q_pos + 1..].to_string());
} else {
path.push_str(before_fragment);
}
} else if let Some(q_pos) = input.find('?') {
path.push_str(&input[..q_pos]);
query = Some(input[q_pos + 1..].to_string());
} else {
path.push_str(input);
}
let (before_fragment, fragment) = match input.find('#') {
Some(pos) => (&input[..pos], Some(input[pos + 1..].to_string())),
None => (&input[..], None),
};
let (path, query) = match before_fragment.find('?') {
Some(pos) =>
(before_fragment[..pos].to_string(), Some(before_fragment[pos + 1..].to_string())),
None => (before_fragment.to_string(), None),
};

Comment thread payjoin/src/core/url.rs Outdated
Comment on lines +127 to +130
Some(ref h) if h.is_empty() => return Err(ParseError::EmptyHost),
Some(Host::Domain(ref d))
if !d.chars().all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '.') =>
return Err(ParseError::InvalidHost),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it make more sense for these parsing errors to be caught in parse_host (would also be more consistent with the other parsing functions)?

Also, if we are erroring for ParseError::EmptyHost why does the host need to be an Option<Host> rather than just Host? This validation seems to ensure that the host will always exist. Not using an Option<Host> would also eliminate the need for has_host.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, this simplifies the logic and any mutants created from host quite a bit

@benalleng benalleng force-pushed the url-rewrite branch 3 times, most recently from 4412ea6 to 0ce98ed Compare April 17, 2026 14:19
Copy link
Copy Markdown
Contributor

@xstoicunicornx xstoicunicornx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack 0ce98ed

Some small changes that I think would be good to have but don't think should block merging.

Native Url interface looks good. Ran tests and fuzzer.

Comment thread payjoin/src/core/url.rs Outdated
Comment on lines +348 to +360
match &self.host {
Host::Domain(s) => raw.push_str(s),
Host::Ipv4(octets) => {
raw.push_str(&format!("{}.{}.{}.{}", octets[0], octets[1], octets[2], octets[3]));
}
Host::Ipv6(segments) => {
raw.push('[');
raw.push_str(
&segments.iter().map(|s| format!("{:x}", s)).collect::<Vec<_>>().join(":"),
);
raw.push(']');
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
match &self.host {
Host::Domain(s) => raw.push_str(s),
Host::Ipv4(octets) => {
raw.push_str(&format!("{}.{}.{}.{}", octets[0], octets[1], octets[2], octets[3]));
}
Host::Ipv6(segments) => {
raw.push('[');
raw.push_str(
&segments.iter().map(|s| format!("{:x}", s)).collect::<Vec<_>>().join(":"),
);
raw.push(']');
}
}
raw.push_str(&self.host_str());

Comment thread payjoin/src/core/url.rs

let url = Url::parse("http://example.com").unwrap();
let segs: Vec<_> = url.path_segments().unwrap().collect();
assert!(segs.is_empty());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra assert just to enforce path always holds / even when there are no segments.

Suggested change
assert!(segs.is_empty());
assert!(segs.is_empty());
assert_eq!(url.as_str(), "http://example.com/");

Copy link
Copy Markdown
Contributor

@DanGould DanGould left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love seeing the in-depth feedback and iteration with teamwork by @xstoicunicornx. Those changes seem sane to include in this PR, I'll have fast turnaround this week to review them, but I'm also into keeping it movin' and took a look now so felt like ACK was appropriate.

Seems like all of my comments were addressed except for the extra fuzzing suggestions which were indeed just extra suggestions utACK 0ce98ed

Comment thread payjoin/src/core/url.rs
Comment on lines +538 to +555
impl serde::Serialize for Url {
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
serializer.serialize_str(&self.raw)
}
}

impl<'de> serde::Deserialize<'de> for Url {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: serde::Deserializer<'de>,
{
let s = String::deserialize(deserializer)?;
Url::from_str(&s).map_err(serde::de::Error::custom)
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you have compared this to url::Url serializer? I'm 99.9% sure we've never used url crate's serialize_internal or anything other than url/serde feature which just does this but wanna ultra check

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the url serialize. Nothing too fancy, just with the nature of our payjoin::Url being as minimal as possible I would rather not have a serialize_internal I would Ideally like as minimal of an interface as possible. It seems unlikely that any user who needs serialize_internal would not already be importing url::Url to do it for them instead.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i definitely do not want serialize internal. What I was worried about was a compat issue. we don't have one 👍

Comment thread payjoin/src/core/url.rs Outdated
let url = Url::parse("http://example.com").unwrap();
let segs: Vec<_> = url.path_segments().unwrap().collect();
assert!(segs.is_empty());
// assert_eq!(url.as_str(), "http://example.com/");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this accidentally added as comment?

benalleng and others added 3 commits April 20, 2026 19:49
This commit implements the minimal native Url Struct and validation logic to be able to replace the Url dep from within payjoin excluding when utilizing any external Url interfaces like when using reqwest in the io feature.

Co-authored-by: xstoicunicornx <xstoicunicornx@users.noreply.github.com>
This commit migrates the monorepo away from the external Url dep to use
the new internal Url. Additionally due to the transition we need to add
a dep for url encoding with `percent-encoding-rfc3986` which
coincidentally get us inline with the bitcoin_uri crate.
Copy link
Copy Markdown
Contributor

@xstoicunicornx xstoicunicornx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack eeb65e0

Copy link
Copy Markdown
Collaborator

@spacebear21 spacebear21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK eeb65e0

@spacebear21 spacebear21 merged commit b654829 into payjoin:master Apr 21, 2026
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Remove url crate dep from payjoin

5 participants